Two-Sample Tests for Large Random Graphs Using Network Statistics
نویسندگان
چکیده
We consider a two-sample hypothesis testing problem, where the distributions are defined on the space of undirected graphs, and one has access to only one observation from each model. A motivating example for this problem is comparing the friendship networks on Facebook and LinkedIn. The practical approach to such problems is to compare the networks based on certain network statistics. In this paper, we present a general principle for two-sample hypothesis testing in such scenarios without making any assumption about the network generation process. The main contribution of the paper is a general formulation of the problem based on concentration of network statistics, and consequently, a consistent two-sample test that arises as the natural solution for this problem. We also show that the proposed test is minimax optimal for certain network statistics.
منابع مشابه
Sampling from social networks’s graph based on topological properties and bee colony algorithm
In recent years, the sampling problem in massive graphs of social networks has attracted much attention for fast analyzing a small and good sample instead of a huge network. Many algorithms have been proposed for sampling of social network’ graph. The purpose of these algorithms is to create a sample that is approximately similar to the original network’s graph in terms of properties such as de...
متن کاملStatistical inference on attributed random graphs: Fusion of graph features and content
Many problems can be cast as statistical inference on an attributed random graph. Our motivation is change detection in communication graphs. We prove that tests based on a fusion of graph-derived and content-derived metadata can be more powerful than those based on graph or content features alone. For some basic attributed random graphmodels, we derive fusion tests from the likelihood ratio. W...
متن کاملA General Framework for Estimating Graphlet Statistics via Random Walk
Graphlets are induced subgraph patterns and have been frequently applied to characterize the local topology structures of graphs across various domains, e.g., online social networks (OSNs) and biological networks. Discovering and computing graphlet statistics are highly challenging. First, the massive size of real-world graphs makes the exact computation of graphlets extremely expensive. Second...
متن کاملConsistent Nonparametric Tests of Independence
Three simple and explicit procedures for testing the independence of two multi-dimensional random variables are described. Two of the associated test statistics (L1, log-likelihood) are defined when the empirical distribution of the variables is restricted to finite partitions. A third test statistic is defined as a kernel-based independence measure. Two kinds of tests are provided. Distributio...
متن کاملAn Ant Colony Optimization Algorithm for Network Vulnerability Analysis
Intruders often combine exploits against multiple vulnerabilities in order to break into the system. Each attack scenario is a sequence of exploits launched by an intruder that leads to an undesirable state such as access to a database, service disruption, etc. The collection of possible attack scenarios in a computer network can be represented by a directed graph, called network attack gra...
متن کامل